In this notebook, a template is provided for you to implement your functionality in stages, which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can typically be edited by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open-ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
# Load pickled data
import pickle
# TODO: fill this in based on where you saved the training and testing data
training_file = 'train.p'
testing_file = 'test.p'
with open(training_file, mode='rb') as f:
train = pickle.load(f)
with open(testing_file, mode='rb') as f:
test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
X_train.shape, X_test.shape
from sklearn.model_selection import train_test_split
import pandas as pd
label_description = pd.read_csv("signnames.csv")
def get_label_name(label, df=label_description):
    """Look up the human-readable sign name for a class id."""
    return df[df['ClassId'] == label]['SignName'].values[0]
label_description['SignName']
### To start off let's do a basic data summary.
# TODO: number of training examples
n_train = X_train.shape[0]
# TODO: number of testing examples
n_test = X_test.shape[0]
# TODO: what's the shape of an image?
image_shape = X_train[0].shape
# TODO: how many classes are in the dataset
n_classes = label_description.shape[0]
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import numpy as np
from matplotlib import pyplot as plt
The training examples are unevenly distributed across classes. This may lead to under-represented classes being overlooked by the model.
plt.hist(y_train, bins=n_classes)
plt.show()
y_train_df = pd.DataFrame(y_train, columns=["ClassId"])
y_train_df = pd.merge(y_train_df, label_description, on="ClassId", how="left")
pd.DataFrame(y_train_df.groupby("SignName").count()).sort_values('ClassId', ascending=False)
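The histogram shows the imbalance visually; a numeric summary can make it easier to quote. The sketch below uses a hypothetical helper, class_imbalance (not part of the original notebook), that counts examples per class with np.bincount and reports the ratio of the largest class to the smallest non-empty one. With the notebook's data it would be called as class_imbalance(y_train, n_classes).

```python
import numpy as np

def class_imbalance(labels, n_classes):
    """Return per-class counts and the ratio of the largest to the smallest class."""
    counts = np.bincount(labels, minlength=n_classes)
    # Ignore classes with zero examples, which would make the ratio undefined
    nonzero = counts[counts > 0]
    ratio = nonzero.max() / nonzero.min()
    return counts, ratio

# Usage with toy labels (not the real dataset)
toy_labels = np.array([0, 0, 0, 0, 1, 2, 2])
counts, ratio = class_imbalance(toy_labels, n_classes=3)
print(counts, ratio)  # [4 1 2] 4.0
```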
image_indexed_by_label = {label: np.nonzero(y_train == label)[0] for label in range(n_classes)}
# labels range from 0 to 42; show `cols` random images for each label in [starts, ends)
starts = 20
ends = 30
cols = 10
rows = ends - starts
f, axarr = plt.subplots(rows, cols)
# f.tight_layout(pad=1.0)
f.set_size_inches(w=15, h=20)
for label in range(starts, ends):
    row = label - starts  # one subplot row per label
    for p, idx in enumerate(np.random.choice(image_indexed_by_label[label], size=cols, replace=False)):
        a = axarr[row, p]
        if p == 0:
            label_name = get_label_name(label)
            a.set_ylabel(label_name)
        a.imshow(X_train[idx])
        a.grid(False)
        a.set_xticklabels([])
        a.set_yticklabels([])
plt.show()
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
### Preprocess the data here.
### Feel free to use as many code cells as needed.
Describe the techniques used to preprocess the data.
Answer:
Normalizing the data to the range [-0.5, 0.5] is supposed to bring benefits such as numerical stability.
I used the code in the following cell for my first few attempts, but my highest accuracy happened to come without this normalization (possibly due to differences in parameters and layers between attempts).
So, as far as I can see, it has no obvious effect on the final result.
# did not actually use this normalization
# X_train = X_train / 255 - 0.5
# X_test = X_test / 255 - 0.5
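If this normalization is used, two details matter. First, `X_train / 255` on a uint8 array produces a float64 array, which doubles memory compared with float32. Second, exactly the same transform has to be applied to X_test and to any new images fed to the model later. A minimal sketch, assuming a hypothetical helper named normalize that is not in the original notebook:

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixels from [0, 255] to [-0.5, 0.5] as float32."""
    # Cast to float32 first so the division does not create a float64 array
    return images.astype(np.float32) / 255.0 - 0.5

# Usage: apply the same function to every split and to new images
# X_train_n, X_test_n = normalize(X_train), normalize(X_test)
```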
### Generate data additional (if you want to!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
"""
Make randomly transformed copies of training images
so that the label distribution becomes more balanced
"""
def resample_and_augumentation(X_train, y_train, target=4000, mean=.85, std=.01, aug="affine"):
    """
    Resample the under-represented classes: every class with fewer than
    `target` examples is topped up with transformed copies of its own images.
    `mean` and `std` are only used by the perspective transform.
    """
    number_of_class = np.unique(y_train).shape[0]
    image_index_by_label = {label: np.flatnonzero(y_train == label) for label in range(number_of_class)}
    new_images = []
    new_labels = []
    for label, image_indices in tqdm_notebook(image_index_by_label.items()):
        image_count = len(image_indices)
        number_of_images_to_generate = target - image_count
        if number_of_images_to_generate <= 0:
            continue
        # draw the original images we want to transform, with replacement
        original_image_indices = np.random.choice(image_indices, size=number_of_images_to_generate, replace=True)
        for original_image_idx in original_image_indices:
            original_image = X_train[original_image_idx]
            if aug == "affine":
                transformed_image = random_affine_transform(original_image, 20, 10, 5)
            elif aug == "perspective":
                transformed_image = perspective_transform(original_image, mean=mean, std=std)
            else:
                transformed_image = original_image
            new_images.append(transformed_image)
            new_labels.append(y_train[original_image_idx])
    # print(y_train.shape, len(new_labels))
    # reshape keeps the concatenation valid even when no images were generated
    new_images = np.array(new_images).reshape((-1,) + X_train.shape[1:])
    new_labels = np.concatenate((y_train, np.array(new_labels, dtype=y_train.dtype)))
    new_images = np.concatenate((X_train, new_images))
    print("data shape after augmentation. X: %s, y: %s" % (new_images.shape, new_labels.shape))
    return new_images, new_labels
def perspective_transform(original_image, mean=.85, std=.01, dtype=np.float16):
    """Randomly compress the left or right edge of the image by a factor drawn from N(mean, std)."""
    rows, cols, ch = original_image.shape
    original_pts = np.float32([[0, 0], [rows, 0],
                               [0, cols], [rows, cols]])
    shrink = np.random.normal(mean, std)
    if np.random.uniform() > 0.5:
        dst_pts = np.float32([[0, cols * (1 - shrink)], [rows, 0],
                              [0, cols * shrink], [rows, cols]])
    else:
        dst_pts = np.float32([[0, 0], [rows, cols * (1 - shrink)],
                              [0, cols], [rows, cols * shrink]])
    M = cv2.getPerspectiveTransform(original_pts, dst_pts)
    # dsize is (width, height); flags and borderMode are passed by keyword
    dst = cv2.warpPerspective(original_image, M, (cols, rows),
                              flags=cv2.INTER_LINEAR + cv2.WARP_FILL_OUTLIERS,
                              borderMode=cv2.BORDER_REPLICATE).astype(dtype)
    return dst
"""
'borrowed' from
https://carnd-udacity.atlassian.net/wiki/cq/viewquestion.action?id=10322627&questionTitle=project-2-unbalanced-data-generating-additional-data-by-jittering-the-original-image
"""
def random_affine_transform(img, ang_range, shear_range, trans_range):
    '''
    This function transforms images to generate new images.
    The function takes the following arguments:
    1- img: the image
    2- ang_range: range of angles for rotation
    3- shear_range: range of values to apply the shear transform over
    4- trans_range: range of values to apply translations over
    A random uniform distribution is used to generate the parameters of each transformation.
    '''
    # Rotation: angle drawn uniformly from [-ang_range/2, ang_range/2).
    # (The original np.random.uniform(ang_range) passed ang_range as the lower
    # bound, so the angle was not centered on zero.)
    ang_rot = ang_range * np.random.uniform() - ang_range / 2
    rows, cols, ch = img.shape
    Rot_M = cv2.getRotationMatrix2D((cols / 2, rows / 2), ang_rot, 1)
    # Translation
    tr_x = trans_range * np.random.uniform() - trans_range / 2
    tr_y = trans_range * np.random.uniform() - trans_range / 2
    Trans_M = np.float32([[1, 0, tr_x], [0, 1, tr_y]])
    # Shear
    pts1 = np.float32([[5, 5], [20, 5], [5, 20]])
    pt1 = 5 + shear_range * np.random.uniform() - shear_range / 2
    pt2 = 20 + shear_range * np.random.uniform() - shear_range / 2
    pts2 = np.float32([[pt1, 5], [pt2, pt1], [5, pt2]])
    shear_M = cv2.getAffineTransform(pts1, pts2)
    img = cv2.warpAffine(img, Rot_M, (cols, rows))
    img = cv2.warpAffine(img, Trans_M, (cols, rows))
    img = cv2.warpAffine(img, shear_M, (cols, rows))
    return img
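random_affine_transform calls cv2.warpAffine three times, so the 32x32 image is resampled three times and border pixels lost by one warp cannot be recovered by the next. An alternative, which is not what this notebook does, is to multiply the three 2x3 matrices into one and warp once, e.g. cv2.warpAffine(img, compose_affine(Rot_M, Trans_M, shear_M), (cols, rows)). The result matches the sequential version up to interpolation and border effects. A NumPy-only sketch of the composition, with hypothetical helper names:

```python
import numpy as np

def to_homogeneous(m):
    """Promote a 2x3 affine matrix (as used by cv2.warpAffine) to 3x3."""
    return np.vstack([m, [0.0, 0.0, 1.0]])

def compose_affine(*matrices):
    """Combine 2x3 affine matrices applied in the given order into one 2x3 matrix.

    Warping by A, then B, then C maps a point p to C @ B @ A @ p,
    so each new matrix is multiplied on the left.
    """
    combined = np.eye(3)
    for m in matrices:
        combined = to_homogeneous(m) @ combined
    return combined[:2]

def apply_affine(m, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    return pts @ m.T
```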
Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?
Answer:
Each traffic sign picture is taken at a different angle and scale. To make the model more general, I initially used a perspective projection to generate several varied copies of the same sign. I then noticed that another student used an affine transformation for the same purpose; I found that this approach led to higher accuracy on the test data, so I used his procedure instead.
The labels in the training data are unevenly distributed: the largest classes have about 10x as many examples as the smallest ones, and it turned out that the classes with the fewest training examples almost always suffer from low accuracy on the test data.
Based on the above reasons I generated additional data like this:
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
import numpy as np
import keras
import matplotlib.pyplot as plt
import matplotlib.image as mpimage
from keras.utils import np_utils
import pickle
from tqdm import tqdm_notebook
import cv2
import itertools
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Convolution2D
from keras.layers.pooling import MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop
from sklearn.model_selection import train_test_split
number_of_class = len(np.unique(y_train))
model1 = keras.models.Sequential()
model1.add(Convolution2D(32, 2, 2, activation='relu', input_shape=(32, 32, 3), border_mode='valid', bias=True))
model1.add(Convolution2D(32, 3, 3, activation='relu', border_mode='valid', bias=True))
model1.add(MaxPooling2D(pool_size=(2, 2), border_mode='valid', dim_ordering='default', strides=(2,2)))
model1.add(Convolution2D(32, 2, 2, activation='relu', border_mode='valid', bias=True))
model1.add(Convolution2D(32, 3, 3, activation='relu', border_mode='valid', bias=True))
model1.add(MaxPooling2D(pool_size=(2, 2), border_mode='valid', dim_ordering='default', strides=(2,2)))
model1.add(Dropout(0.5))
model1.add(Convolution2D(48, 3, 3, activation='relu', border_mode='valid', bias=True))
model1.add(Flatten())
model1.add(Dense(108, name='hidden1', activation='relu'))
model1.add(Dense(number_of_class, name='output', activation='softmax'))
model1.compile(loss='categorical_crossentropy',
optimizer=Adam(decay=0.),
metrics=['accuracy'])
model1.summary()
It wasn't until late that I realized I was supposed to build the CNN directly with TensorFlow.
Because I jumped to the Keras lesson before doing the project, and missed the requirement in the project intro that TensorFlow is the tool to use, I built my model with Keras instead.
Since the project seems to focus more on modeling and data preprocessing than on building the underlying computation graph, I still submitted my code in Keras.
Answer:
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer:
I came up with the following train_with_data function, which I found handy for controlling and tuning the training process.
With each call to train_with_data, a new batch of augmented images is generated for training, while the original (non-augmented) images of the training split are always included. So in each round of training, the information carried by the original pictures is always taken into account.
Between calls to the function, I may decrease the learning rate to get past plateaus in the loss.
I chose the Adam optimizer with the default batch_size of 128. I started with the default learning rate and decreased it after a few epochs.
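The amount of augmentation per call follows from two lines of the notebook: train_with_data sets target = int(np.bincount(y).max() * aug_factor), and resample_and_augumentation tops every class up to that target. So with aug_factor > 1 even the largest class receives new images, and after augmentation every class has the same size. The sketch below reproduces only that arithmetic; augmentation_plan is a hypothetical name, and for simplicity it computes the target and the counts from one label array (the notebook computes the target from the full y but tops up the counts of the training split).

```python
import numpy as np

def augmentation_plan(labels, aug_factor):
    """Return the target class size and the number of new images per class."""
    counts = np.bincount(labels)
    target = int(counts.max() * aug_factor)
    # Classes already at or above the target get no new images
    return target, np.maximum(target - counts, 0)

# Usage with toy labels: class 0 has 3 images, class 1 has 1
target, extra = augmentation_plan(np.array([0, 0, 0, 1]), aug_factor=2)
print(target, extra)  # 6 [3 5]
```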
def train_with_data(model, X, y, augument, test_size=0.25, seed=42, batch_size=128, nb_epoch=5, aug_factor=2):
    """
    Split off a validation set, augment the training part, and then train
    """
    X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    print("train validation split. train: %s | validation: %s | ratio: %f" % (len(X_train_split), len(X_val_split), test_size))
    target = int(np.bincount(y).max() * aug_factor)
    print("augmentation target: %d" % target, " factor: %0.1f" % aug_factor)
    X_train_split_aug, y_train_split_aug = augument(X_train_split, y_train_split, target=target)
    history = model.fit(X_train_split_aug, np_utils.to_categorical(y_train_split_aug),
                        batch_size=batch_size, nb_epoch=nb_epoch,
                        validation_data=(X_val_split, np_utils.to_categorical(y_val_split)),
                        verbose=1)
    return history
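train_with_data re-splits X and y on every call and augments only the training part. Because random_state=seed is a fixed integer (42 by default), train_test_split returns the same partition on every call with the same inputs, so images held out for validation in one round never end up in the training set of a later round. The NumPy sketch below, with a hypothetical split_indices helper, only illustrates that property; it does not reproduce the exact indices sklearn chooses.

```python
import numpy as np

def split_indices(n, test_size, seed):
    """Shuffle indices 0..n-1 with a fixed seed and split into (train, validation)."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n)
    n_val = int(np.ceil(n * test_size))
    return perm[n_val:], perm[:n_val]

# Two calls with the same seed give the same validation set
train_a, val_a = split_indices(100, 0.25, seed=42)
train_b, val_b = split_indices(100, 0.25, seed=42)
print(np.array_equal(val_a, val_b))  # True
```

If the seed changed between calls, validation images from earlier rounds would leak into training and the reported validation accuracy would be optimistic.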
# if you want to skip training, I have saved the model and weights
from keras import backend as K
model1 = keras.models.load_model("model1.h5")
# print("phase 1, lr=", model1.optimizer.get_config()['lr'])
# train_with_data(model1, X_train, y_train, resample_and_augumentation, nb_epoch=2, aug_factor=2)
# train_with_data(model1, X_train, y_train, resample_and_augumentation, nb_epoch=3, aug_factor=2)
# K.set_value(model1.optimizer.lr, K.get_value(model1.optimizer.lr) / 10)
# print("phase 2, lr=", model1.optimizer.get_config()['lr'])
# train_with_data(model1, X_train, y_train, resample_and_augumentation, nb_epoch=2, aug_factor=3)
# reaches slightly more than 0.95 accuracy and 0.27 loss at the end of "phase 2"
model1.evaluate(X_test, np_utils.to_categorical(y_test))
model1.save("model1.h5")
# train more
# K.set_value(model1.optimizer.lr, K.get_value(model1.optimizer.lr) / 10)
# print("phase 3, lr=", model1.optimizer.get_config()['lr'])
# history = train_with_data(model1, X_train, y_train, resample_and_augumentation, nb_epoch=2, aug_factor=2.5)
# Update the learning-rate variable in place; rebinding it with `model1.optimizer.lr /= 10`
# may have no effect once Keras has already built the training function
K.set_value(model1.optimizer.lr, K.get_value(model1.optimizer.lr) / 10)
print("phase 4, lr=", model1.optimizer.get_config()['lr'])
history = train_with_data(model1, X_train, y_train, resample_and_augumentation, nb_epoch=2, aug_factor=2.5)
# 0.955 seems to be the highest accuracy one can get with this model
model1.evaluate(X_test, np_utils.to_categorical(y_test))
Randomly check some prediction results (plot_prediction_prob is defined in a later cell, so that cell must be run first).
plot_prediction_prob(model1, X_test[np.random.randint(0, len(X_test), size=10)], k=5)
evaluation_predictions = model1.predict(X_test)
correct_predictions = np.array(np.argmax(evaluation_predictions, axis=1) == y_test)
wrong_predictions = np.array(np.argmax(evaluation_predictions, axis=1) != y_test)
f, axies = plt.subplots(2,1)
f.set_size_inches((7,5))
axies[0].set_title("Correct Predictions")
axies[1].set_title("Wrong Predictions")
axies[0].hist(np.sort(evaluation_predictions[correct_predictions], axis=1)[:,-1], bins=20)
axies[1].hist(np.sort(evaluation_predictions[wrong_predictions], axis=1)[:,-1], bins=20)
plt.show()
Finding: the model is quite confident about its correct predictions and much less confident about its wrong ones.
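The two histograms can be summarized by a single number each: the mean top-1 probability over correct and over wrong predictions. A minimal sketch, using a hypothetical helper that is not in the original notebook; with the notebook's variables it would be called as confidence_by_correctness(evaluation_predictions, y_test). If there are no wrong predictions, the second mean is taken over an empty array and NumPy returns nan with a warning.

```python
import numpy as np

def confidence_by_correctness(probs, y_true):
    """Mean top-1 probability for correct and for wrong predictions."""
    predicted = probs.argmax(axis=1)
    top1 = probs.max(axis=1)
    correct = predicted == y_true
    return top1[correct].mean(), top1[~correct].mean()

# Toy example: the third prediction (class 0) is wrong
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(confidence_by_correctness(probs, np.array([0, 1, 1])))  # about (0.85, 0.6)
```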
What approach did you take in coming up with a solution to this problem?
Answer:
My early efforts were spent on understanding the network architecture in Sermanet and LeCun's paper, but I didn't know how to interpret the parameters described in the paper, so I could not follow the exact setup of its network.
Then I followed the layer patterns section of the Stanford CS231n course and built a layer structure similar to the current one: [conv2d -> maxpooling] * 2 -> dropout -> conv2d -> fc -> classifier. I trained it for 40 epochs and reached 91% accuracy on the test data, but no more. I was satisfied neither with the final result nor with the relatively long training time.
I wanted to improve the model, so I made several attempts (a lot of time was spent on training and comparing results) and finally arrived at the pattern: [conv2d * 2 -> maxpooling] * 2 -> dropout -> conv2d -> fc -> classifier. I found that with seemingly similar layer setups, a small change in layer parameters (say, changing strides from (2,2) to (1,1)) greatly affects the total number of parameters, and the network becomes slower to train and more likely to overfit. So the guideline that led me to the final solution was to increase the number of layers while carefully setting their parameters so that the number of weights does not "explode".
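model1.summary() reports these numbers directly, but redoing the arithmetic shows where the "explosion" comes from. The sketch below walks the final layer stack, assuming 'valid' padding everywhere, one bias per filter or unit, and that the stride change refers to the two 2x2 max-pooling layers. The convolution weights do not depend on the feature-map size, but the flattened output grows from 3x3x48 = 432 values to 22x22x48 = 23,232, so the Dense(108) layer alone goes from 46,764 to 2,509,164 parameters. count_params and output_size are hypothetical helper names.

```python
def output_size(n, kernel, stride=1):
    """Spatial size after a 'valid' convolution or pooling window (no padding)."""
    return (n - kernel) // stride + 1

def count_params(pool_stride=2, size=32, channels=3, n_classes=43):
    """Walk the final layer stack and return (total_params, flatten_size)."""
    total = 0
    # (filters, kernel) for each conv layer; "pool" marks a 2x2 max-pooling layer
    layers = [(32, 2), (32, 3), "pool", (32, 2), (32, 3), "pool", (48, 3)]
    for layer in layers:
        if layer == "pool":
            size = output_size(size, 2, pool_stride)  # pooling has no weights
            continue
        filters, kernel = layer
        total += kernel * kernel * channels * filters + filters  # weights + biases
        size = output_size(size, kernel)
        channels = filters
    flat = size * size * channels
    total += flat * 108 + 108               # Dense(108)
    total += 108 * n_classes + n_classes    # Dense(n_classes) output layer
    return total, flat

print(count_params(pool_stride=2))  # (88363, 432)
print(count_params(pool_stride=1))  # (2550763, 23232)
```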
from sklearn.metrics import confusion_matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          size=(10, 10),
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # normalize before plotting so the image and the cell labels show the same values
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    # print(cm)
    plt.figure(figsize=size)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if cm[i, j] == 0:
            continue  # leave empty cells unlabeled
        plt.text(j, i, "%.2f" % cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()
def confusion_matrix_for_model(model, X, y, labels=43, size=(20,20), accuracy=True):
    def get_accuracy(cm, y):
        # divide each row by the number of true examples of that class
        for label in range(labels):
            cm[label] = cm[label] / (y == label).sum()
        return cm
    predictions = model.predict_classes(X)
    print("accuracy: ", (predictions == y).sum() / y.shape[0])
    # fix the label set so the matrix is labels x labels even if a class never appears
    cm = confusion_matrix(y, predictions, labels=list(range(labels)))
    if accuracy:
        cm = get_accuracy(cm.astype(np.float32), y)
    plot_confusion_matrix(cm, range(labels), size=size)
confusion_matrix_for_model(model1, X_test, y_test)
From the confusion matrix above, we see that test cases with ClassId 21 and 27 have significantly lower accuracy than the others; they are most often misclassified as labels 31 and 21, respectively.
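Reading the weakest classes off a 43x43 plot is error-prone; they can also be extracted from the raw-count matrix. A minimal sketch with a hypothetical worst_classes helper: it computes per-class accuracy (the diagonal divided by the row sum) and, for each of the k lowest-accuracy classes, the most frequent wrong prediction. With the notebook's objects it could be called as worst_classes(confusion_matrix(y_test, model1.predict_classes(X_test), labels=list(range(43)))).

```python
import numpy as np

def worst_classes(cm, k=2):
    """Return (class, accuracy, most-confused-with) for the k lowest-accuracy classes.

    `cm` is a raw-count confusion matrix with true labels on rows and
    predicted labels on columns, as returned by sklearn's confusion_matrix.
    """
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1)
    accuracy = np.diag(cm) / np.maximum(totals, 1)  # avoid division by zero for empty rows
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions when finding confusions
    worst = np.argsort(accuracy)[:k]
    return [(int(c), float(accuracy[c]), int(off_diag[c].argmax())) for c in worst]

# Toy 3-class matrix: class 2 is often predicted as class 1
print(worst_classes([[8, 2, 0], [1, 9, 0], [0, 5, 5]], k=2))  # [(2, 0.5, 1), (0, 0.8, 1)]
```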
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.
I live in Germany, so I went around and captured some traffic signs near my place, but found afterwards that only some of their labels are in the training set, so there is no chance of predicting the others correctly. Still, it's interesting to see the results.
As we can see in a more comprehensive list of German traffic signs, prediction in a real setting can be more challenging, since there are many similar traffic signs, and some signs only make sense when they appear together.
fnames = ["rIMG_5818.jpg", "web03.jpg", "web10.jpg", "rIMG_5835.jpg", "rIMG_5857.jpg", "rIMG_5836.jpg"]
notes = ["mis-leading background",
"covered by snow",
"printed on ground",
"angled+backlight",
"dark",
"two signs"]
f, axes = plt.subplots(1, len(fnames))
f.set_size_inches((15, 5))
for a, fname, note in zip(axes, fnames, notes):
    img = plt.imread("my_pic/%s" % fname)
    a.imshow(img)
    a.set_title(note)
    a.set_xticklabels([])
    a.set_yticklabels([])
plt.show()
Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the dataset?
Answer:
If using my model in a camera stream scenario:
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
Answer:
I use the following code to plot each traffic sign along with the certainty of the predictions. The following are some observations:
k = 5
def plot_prediction_prob(model, images, k, figure_size=(8,20)):
    predict_probs = model.predict_proba(images)
    # column indices of the k largest probabilities, in ascending order of probability
    topk_arr = np.argsort(predict_probs, axis=1)[:,-k:]
    # predictions = np.array([l in topk for l, topk in zip(correct_label, topk_arr)])
    topk_probs_and_labels = [(probs[labels], [get_label_name(l) for l in labels]) for probs, labels in zip(predict_probs, topk_arr)]
    f, axes = plt.subplots(len(topk_probs_and_labels), 2)
    f.set_size_inches(figure_size)
    for idx in range(len(topk_probs_and_labels)):
        probs, labels = topk_probs_and_labels[idx]
        left, right = axes[idx]
        right.set_xlim((0,1))
        left.set_xticklabels([])
        left.set_yticklabels([])
        left.set_xticks([])
        left.set_yticks([])
        left.imshow(images[idx])
        length = range(len(probs))
        right.tick_params(labelright=True, labelleft=False)
        # horizontal bars; because of the ascending order, the most likely class is drawn at the top
        right.barh(length, probs, tick_label=labels, height=0.3)
    plt.show()
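The question above also asks whether the correct label appears in the top k when the first guess is wrong. plot_prediction_prob only draws the bars, so that check has to be done separately. Below is a NumPy sketch of a top-k helper that mirrors the (values, indices) output of tf.nn.top_k, sorted from most to least likely, plus a membership test; both function names are hypothetical. For the test set, in_top_k(model1.predict(X_test), y_test, k=5).mean() would give the top-5 accuracy; the photos taken around town have no labels in the notebook, so they would need to be labeled by hand first.

```python
import numpy as np

def top_k(probs, k=5):
    """Return (values, indices) of the k most likely classes per row, most likely first."""
    indices = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    values = np.take_along_axis(probs, indices, axis=1)
    return values, indices

def in_top_k(probs, y_true, k=5):
    """Boolean mask: is the true label among the k highest-probability classes?"""
    _, indices = top_k(probs, k)
    return (indices == np.asarray(y_true)[:, None]).any(axis=1)

probs = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
print(top_k(probs, k=2)[1])            # [[1 2] [0 1]]
print(in_top_k(probs, [2, 2], k=2))    # [ True False]
```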
fnames = ["rIMG_5818.jpg", "web03.jpg", "web10.jpg", "rIMG_5835.jpg", "rIMG_5857.jpg", "rIMG_5836.jpg"]
my_images = np.array([ plt.imread("my_pic/%s" % fname) for fname in fnames ], dtype=np.int32)
plot_prediction_prob(model1, my_images, 5, figure_size=(8,8))
fnames = """rIMG_5828.jpg rIMG_5829.jpg rIMG_5831.jpg rIMG_5832.jpg
rIMG_5833.jpg rIMG_5834.jpg rIMG_5835.jpg
rIMG_5836.jpg rIMG_5845.jpg rIMG_5857.jpg
r-0.jpg r-1.jpg""".split()
my_images = np.array([ plt.imread("my_pic/%s" % fname) for fname in fnames ], dtype=np.int32)
plot_prediction_prob(model1, my_images, 5)
fnames = "web01.jpg web02.jpg web03.jpg web04.jpg web05.jpg web06.jpg web07.jpg web08.jpg web09.jpg web10.jpg".split()
my_images = np.array([ plt.imread("my_pic/%s" % fname) for fname in fnames ], dtype=np.int32)
plot_prediction_prob(model1, my_images, 5)
fnames = """rIMG_5806.jpg rIMG_5808.jpg rIMG_5811.jpg rIMG_5818.jpg rIMG_5821.jpg rIMG_5824.jpg
rIMG_5807.jpg rIMG_5810.jpg rIMG_5812.jpg rIMG_5820.jpg rIMG_5822.jpg rIMG_5826.jpg""".split()
my_images = np.array([ plt.imread("my_pic/%s" % fname) for fname in fnames ], dtype=np.int32)
plot_prediction_prob(model1, my_images, 5)
If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.
Answer:
Since I'm using Keras, this could be done by simply calling:
model1.predict_classes(X_input)
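Sequential.predict_classes was removed in later TensorFlow/Keras releases, so taking the argmax of model.predict is the more portable form of the call above. A minimal sketch of a small classification interface, with a hypothetical classify function: it assumes the input images are already 32x32x3 and preprocessed exactly like the training data (the final model was trained on raw pixel values), and that label_names is ordered by class id, e.g. label_description['SignName'].tolist().

```python
import numpy as np

def classify(model, images, label_names):
    """Predict class ids and sign names for a batch of 32x32x3 images."""
    batch = np.asarray(images)
    if batch.shape[1:] != (32, 32, 3):
        raise ValueError("expected images of shape (N, 32, 32, 3), got %s" % (batch.shape,))
    # argmax over the softmax output works whether or not predict_classes exists
    class_ids = np.argmax(model.predict(batch), axis=1)
    return class_ids, [label_names[i] for i in class_ids]

# Usage in the notebook:
# ids, names = classify(model1, my_images, label_description['SignName'].tolist())
```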
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.